Initialization of Iterative Refinement Clustering Algorithms

Authors

  • Usama M. Fayyad
  • Cory Reina
  • Paul S. Bradley
Abstract

Iterative refinement clustering algorithms (e.g. K-Means, EM) converge to one of numerous local minima. It is known that they are especially sensitive to initial conditions. We present a procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution. The refined initial starting condition leads to convergence to “better” local minima. The procedure is applicable to a wide class of clustering algorithms for both discrete and continuous data. We demonstrate the application of this method to the Expectation Maximization (EM) clustering algorithm and show that refined initial points indeed lead to improved solutions. Refinement run time is considerably lower than the time required to cluster the full database. The method is scalable and can be coupled with a scalable clustering algorithm to address large-scale clustering in data mining.
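
The abstract does not spell out the refinement procedure. As a rough sketch only, assume a subsample-based scheme: cluster several small random subsamples from the given start, then cluster the pooled subsample solutions to smooth them, and keep the best-fitting result. K-Means is substituted for EM here purely to keep the example short. The function names `kmeans`, `distortion`, and `refine_start`, and the parameters `n_subsamples` and `frac`, are illustrative choices, not taken from the paper.

```python
import numpy as np

def kmeans(X, centers, n_iter=50):
    """Plain Lloyd iterations from the given starting centers."""
    centers = np.array(centers, dtype=float)  # float copy of the start
    for _ in range(n_iter):
        # Squared distances from every point to every center, shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old center
                new[j] = members.mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    return centers

def distortion(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def refine_start(X, init, n_subsamples=10, frac=0.1, seed=0):
    """Hypothetical refinement of `init` (assumed scheme, see lead-in)."""
    rng = np.random.default_rng(seed)
    size = max(len(init), int(frac * len(X)))
    # 1. Cluster several small random subsamples, each from the same start.
    solutions = []
    for _ in range(n_subsamples):
        idx = rng.choice(len(X), size=size, replace=False)
        solutions.append(kmeans(X[idx], init))
    # 2. Pool the subsample solutions and cluster the pool from each of them.
    pool = np.vstack(solutions)
    candidates = [kmeans(pool, s) for s in solutions]
    # 3. Keep the candidate with the lowest distortion over the pool.
    return min(candidates, key=lambda c: distortion(pool, c))
```

The refined centers would then be passed as the starting point to the full clustering run. Because each step touches only a small subsample or the small pool of solutions, the refinement cost stays well below that of clustering the whole dataset, which is consistent with the run-time claim in the abstract.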

Similar resources

A robust iterative refinement clustering algorithm with smoothing search space

Iterative refinement clustering algorithms are widely used in the data mining area, but they are sensitive to initialization. In the past decades, many modified initialization methods have been proposed to reduce the influence of the initialization sensitivity problem. The essence of iterative refinement clustering algorithms is the local search method. The big numbers of the local minimum points w...

Maximin Initialization for Cluster Analysis

Most iterative clustering algorithms require a good initialization to achieve accurate results. A new initialization procedure for all such algorithms is given that is exact when the data contain compact, separated clusters. Our examples use c-means clustering.

TR-2011002: Symbolic Lifting for Structured Linear Systems of Equations: Numerical Initialization, Nearly Optimal Boolean Cost, Variations, and Extensions

Hensel’s symbolic lifting for a linear system of equations and numerical iterative refinement of its solution have striking similarity. Combining the power of lifting and refinement seems to be a natural resource for further advances, but turns out to be hard to exploit. In this paper, however, we employ iterative refinement to initialize lifting. In the case of Toeplitz, Hankel, and other popu...

Symbolic Lifting for Structured Linear Systems of Equations: Numerical Initialization, Nearly Optimal Boolean Cost, Variations, and Extensions

Hensel’s symbolic lifting for a linear system of equations and numerical iterative refinement of its solution have striking similarity. Combining the power of lifting and refinement seems to be a natural resource for further advances, but turns out to be hard to exploit. In this paper, however, we employ iterative refinement to initialize lifting. In the case of Toeplitz, Hankel, and other popu...

An improved opposition-based Crow Search Algorithm for Data Clustering

Data clustering is an ideal way of working with a huge amount of data and looking for structure in the dataset. In other words, clustering groups similar data together; the similarity among data within a cluster is maximal and the similarity among data in different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...

Publication date: 1998